Abstract. The cutoff phenomenon describes a case where a Markov
chain exhibits a sharp transition in its convergence to stationarity. In 
1996, Diaconis surveyed this phenomenon, and asked how one could 
recognize its occurrence in families of finite ergodic Markov chains. In 
2004, the third author noted that a necessary condition for cutoff in a
family of reversible chains is that the product of the mixing-time and 
spectral-gap tends to infinity, and conjectured that in many settings, this 
condition should also be sufficient. Diaconis and Saloff-Coste (2006) 
verified this conjecture for continuous-time birth-and-death chains,
started at an endpoint, with convergence measured in separation. It is
natural to ask whether the conjecture holds for these chains in the more
widely used total-variation distance. 

In this work, we confirm the above conjecture for all continuous-time 
or lazy discrete-time birth-and-death chains, with convergence measured 
via total-variation distance. Namely, if the product of the mixing-time 
and spectral-gap tends to infinity, the chains exhibit cutoff at the maximal 
hitting time of the stationary distribution median, with a window of at 
most the geometric mean between the relaxation-time and mixing-time. 

In addition, we show that for any lazy (or continuous-time)
birth-and-death chain with stationary distribution $\pi$, the separation
$1 - p^t(x,y)/\pi(y)$ is maximized when $x, y$ are the endpoints. Together
with the above results, this implies that total-variation cutoff is
equivalent to separation cutoff in any family of such chains.



1. Introduction 

The cutoff phenomenon arises when a finite Markov chain converges
abruptly to equilibrium. Roughly, this is the case where, over a negligible
period of time known as the cutoff window, the distance of the chain
from the stationary measure drops from near its maximum to near 0.

Let $(X_t)$ denote an aperiodic irreducible Markov chain on a finite state
space $\Omega$ with transition kernel $P(x,y)$, and let $\pi$ denote its
stationary distribution. For any two distributions $\mu, \nu$ on $\Omega$,
their total-variation distance is defined to be

$$\|\mu - \nu\|_{\mathrm{TV}} := \sup_{A \subset \Omega} |\mu(A) - \nu(A)| = \frac12 \sum_{x \in \Omega} |\mu(x) - \nu(x)| .$$

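Consider the worst-case total-variation distance to stationarity at time $t$,

$$d(t) := \max_{x \in \Omega} \|\mathbb{P}_x(X_t \in \cdot) - \pi\|_{\mathrm{TV}} ,$$

where $\mathbb{P}_x$ denotes the probability given $X_0 = x$. The
total-variation mixing-time of $(X_t)$, denoted by $t_{\mathrm{MIX}}(\epsilon)$
for $0 < \epsilon < 1$, is defined to be

$$t_{\mathrm{MIX}}(\epsilon) := \min\{t : d(t) \le \epsilon\} .$$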
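
As a minimal numerical sketch of these two definitions (not part of the original text), the following computes $d(t)$ and $t_{\mathrm{MIX}}(\epsilon)$ by brute force for a small lazy birth-and-death chain. The example chain (a lazy nearest-neighbour walk on $\{0,\ldots,10\}$ with uniform stationary law) and all function names are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def lazy_walk(n: int) -> Matrix:
    """Lazy nearest-neighbour walk on {0,...,n}: step -1 or +1 w.p. 1/4 each,
    otherwise hold (a blocked step at an endpoint also counts as a hold)."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        for y in (x - 1, x + 1):
            if 0 <= y <= n:
                P[x][y] = 0.25
        P[x][x] = 1.0 - sum(P[x])
    return P


def tv_distance(mu: Vector, nu: Vector) -> float:
    """Total-variation distance: half the l1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def worst_case_distances(P: Matrix, pi: Vector, t_max: int) -> List[float]:
    """Return [d(0), ..., d(t_max)], where d(t) = max_x ||P^t(x, .) - pi||_TV."""
    size = len(P)
    # rows[x] holds the distribution P^t(x, .); start from point masses (t = 0).
    rows = [[1.0 if y == x else 0.0 for y in range(size)] for x in range(size)]
    d = []
    for _ in range(t_max + 1):
        d.append(max(tv_distance(row, pi) for row in rows))
        rows = [[sum(row[z] * P[z][y] for z in range(size)) for y in range(size)]
                for row in rows]
    return d


def mixing_time(d: List[float], eps: float) -> int:
    """t_MIX(eps) = min{t : d(t) <= eps}, searched within the computed horizon."""
    for t, value in enumerate(d):
        if value <= eps:
            return t
    raise ValueError("horizon too short: increase t_max")


n = 10
P = lazy_walk(n)
pi = [1.0 / (n + 1)] * (n + 1)  # P is symmetric, so pi is uniform
d = worst_case_distances(P, pi, t_max=600)
print(mixing_time(d, 0.25), mixing_time(d, 0.05))
```

The horizon `t_max=600` is an ad hoc choice that is comfortably long for this small chain; larger state spaces would need a longer horizon.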

Next, consider a family of such chains, $(X_t^{(n)})$, each with its
corresponding worst-distance from stationarity $d_n(t)$, its mixing-times
$t^{(n)}_{\mathrm{MIX}}$, etc. We say that this family of chains exhibits
cutoff iff the following sharp transition in its convergence to
stationarity occurs:

$$\lim_{n\to\infty} \frac{t^{(n)}_{\mathrm{MIX}}(\epsilon)}{t^{(n)}_{\mathrm{MIX}}(1-\epsilon)} = 1 \quad \text{for any } 0 < \epsilon < 1 . \tag{1.1}$$

Our main result is an essentially tight bound on the difference between
$t_{\mathrm{MIX}}(\epsilon)$ and $t_{\mathrm{MIX}}(1-\epsilon)$ for general
birth-and-death chains; a birth-and-death chain has the state space
$\{0, \ldots, n\}$ for some integer $n$, and always moves from one state to
a state adjacent to it (or stays in place).

We first state a quantitative bound for a single chain, then deduce a cutoff
criterion. Let gap be the spectral-gap of the chain (that is,
$\mathrm{gap} := 1 - \lambda$ where $\lambda$ is the largest absolute-value
of all nontrivial eigenvalues of the transition kernel $P$), and let
$t_{\mathrm{REL}} := \mathrm{gap}^{-1}$ denote the relaxation-time of the
chain. A chain is called lazy if $P(x,x) \ge \frac12$ for all $x \in \Omega$.

Theorem 1. For any $0 < \epsilon < \frac12$ there exists an explicit
$c_\epsilon > 0$ such that every lazy irreducible birth-and-death chain
$(X_t)$ satisfies

$$t_{\mathrm{MIX}}(\epsilon) - t_{\mathrm{MIX}}(1-\epsilon) \le c_\epsilon \sqrt{t_{\mathrm{REL}} \cdot t_{\mathrm{MIX}}(\tfrac14)} . \tag{1.2}$$

As we later show, the above theorem extends to continuous-time chains,
as well as to $\delta$-lazy chains, which satisfy $P(x,x) \ge \delta$ for
all $x \in \Omega$.

The notion of a cutoff-window relates Theorem 1 to the cutoff phenomenon.
A sequence $w_n$ is called a cutoff window for a family of chains
$(X_t^{(n)})$ if the following holds: $w_n = o\big(t^{(n)}_{\mathrm{MIX}}(\tfrac14)\big)$,
and for any $\epsilon > 0$ there exists some $c_\epsilon > 0$ such that,
for all $n$,

$$t^{(n)}_{\mathrm{MIX}}(\epsilon) - t^{(n)}_{\mathrm{MIX}}(1-\epsilon) \le c_\epsilon w_n . \tag{1.3}$$

Equivalently, if $t_n$ and $w_n$ are two sequences such that $w_n = o(t_n)$,
one may define that a sequence of chains exhibits cutoff at $t_n$ with
window $w_n$ iff

$$\lim_{\lambda\to\infty} \liminf_{n\to\infty} d_n(t_n - \lambda w_n) = 1 , \qquad \lim_{\lambda\to\infty} \limsup_{n\to\infty} d_n(t_n + \lambda w_n) = 0 .$$

To go from the first definition to the second, take
$t_n = t^{(n)}_{\mathrm{MIX}}(\tfrac14)$.

Once we compare the forms of (1.2) and (1.3), it becomes clear that
Theorem 1 implies a bound on the cutoff window for any general family of
birth-and-death chains, provided that
$t^{(n)}_{\mathrm{REL}} = o\big(t^{(n)}_{\mathrm{MIX}}(\tfrac14)\big)$.
Theorem 1 will be the key to establishing the criterion for total-variation
cutoff in a general family of birth-and-death chains.

1.1. Background. The cutoff phenomenon was first identified for the case 
of random transpositions on the symmetric group in [11], and for the case 
of random walks on the hypercube in [1]. It was given its name by Aldous 
and Diaconis in their famous paper [3] from 1985, where they showed that 
the top-in-at-random card shuffling process (repeatedly removing the top 
card and reinserting it to the deck at a random position) has such a behavior. 
Saloff-Coste [25] surveys the cutoff phenomenon for random walks on finite 
groups. 

Though many families of chains are believed to exhibit cutoff, proving 
the occurrence of this phenomenon is often an extremely challenging task, 
hence there are relatively few examples for which cutoff has been rigorously 
shown. In 1996, Diaconis [7] surveyed the cutoff phenomenon, and asked if 
one could determine whether or not it occurs in a given family of aperiodic 
and irreducible finite Markov chains. 

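In 2004, the third author [24] observed that a necessary condition for
cutoff in a family of reversible chains is that the product
$t^{(n)}_{\mathrm{MIX}}(\tfrac14) \cdot \mathrm{gap}^{(n)}$ tends to infinity
with $n$, or equivalently, $t^{(n)}_{\mathrm{REL}} = o\big(t^{(n)}_{\mathrm{MIX}}(\tfrac14)\big)$;
see Lemma 2.1. The third author also conjectured that, in many natural
classes of chains,

$$\text{Cutoff occurs if and only if } t^{(n)}_{\mathrm{REL}} = o\big(t^{(n)}_{\mathrm{MIX}}(\tfrac14)\big) . \tag{1.4}$$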

In the general case, this condition does not always imply cutoff: Aldous
[2] and Pak (private communication via P. Diaconis) have constructed
relevant examples (see also [5], [6] and [21]). This left open the question
of characterizing the classes of chains for which (1.4) holds.

One important class is the family of birth-and-death chains; see [10] for
many natural examples of such chains. They also occur as the magnetization
chain of the mean-field Ising Model (see [12], [20]).

In 2006, Diaconis and Saloff-Coste [10] verified a variant of the
conjecture (1.4) for birth-and-death chains, when the convergence to
stationarity is measured in separation, that is, according to the decay of
$\mathrm{sep}(\mathbb{P}_0(X_t \in \cdot), \pi)$, where
$\mathrm{sep}(\mu, \nu) = \sup_{x \in \Omega} \big(1 - \frac{\mu(x)}{\nu(x)}\big)$.
Note that, although $\mathrm{sep}(\mu, \nu)$ assumes values in $[0, 1]$, it
is in fact not a metric (it is not even symmetric). See, e.g.,
[4, Chapter 4] for the connections between mixing-times in total-variation
and in separation.

More precisely, it was shown in [10] that any family of continuous-time
birth-and-death chains, started at 0, exhibits cutoff in separation if and
only if $t_{\mathrm{REL}} = o\big(t_{\mathrm{sep}}(\tfrac14; 0)\big)$, where
$t_{\mathrm{sep}}(\epsilon; s) = \min\{t : \mathrm{sep}(\mathbb{P}_s(X_t \in \cdot), \pi) < \epsilon\}$.
The proof used a spectral representation of passage times [16, 17] and
duality of strong stationary times. Whether (1.4) holds with respect to the
important and widely used total-variation distance remained unsettled.

1.2. Total-variation cutoff. In this work, we verify the conjecture (1.4)
for arbitrary birth-and-death chains, with the convergence to stationarity
measured in total-variation distance. Our first result, which is a direct
corollary of Theorem 1, establishes this for lazy discrete-time irreducible
birth-and-death chains. We then derive versions of this result for
continuous-time irreducible birth-and-death chains, as well as for
$\delta$-lazy discrete chains (where $P(x,x) \ge \delta$ for all
$x \in \Omega$). In what follows, we omit the dependence on $n$ wherever it
is clear from the context. Here and throughout the paper, the abbreviation
$t_{\mathrm{MIX}}$ stands for $t_{\mathrm{MIX}}(\tfrac14)$.

Corollary 2. Let $(X_t^{(n)})$ be a sequence of lazy irreducible
birth-and-death chains. Then it exhibits cutoff in total-variation distance
iff $t^{(n)}_{\mathrm{MIX}} \cdot \mathrm{gap}^{(n)}$ tends to infinity with
$n$. Furthermore, the cutoff window size is at most the geometric mean
between the mixing-time and relaxation time.

As we will later explain, the given bound
$\sqrt{t_{\mathrm{MIX}} \cdot t_{\mathrm{REL}}}$ for the cutoff window is
essentially tight, in the following sense. Suppose that the functions
$t_M(n)$ and $t_R(n) \ge 2$ denote the mixing-time and relaxation-time of
$(X_t^{(n)})$, a family of irreducible lazy birth-and-death chains. Then
there exists a family $(Y_t^{(n)})$ of such chains with the parameters
$t^{(n)}_{\mathrm{MIX}} = (1 + o(1))\, t_M(n)$ and
$t^{(n)}_{\mathrm{REL}} = (1 + o(1))\, t_R(n)$ that has a cutoff window of
$\big(t^{(n)}_{\mathrm{MIX}} \cdot t^{(n)}_{\mathrm{REL}}\big)^{1/2}$. In
other words, no better bound on the cutoff window can be given without
exploiting additional information on the chains.

Indeed, there are examples where additional attributes of the chain imply
a cutoff window of order smaller than
$\sqrt{t_{\mathrm{MIX}} \cdot t_{\mathrm{REL}}}$. For instance, the cutoff
window has size $t_{\mathrm{REL}}$ for the Ehrenfest urn (see, e.g., [9])
and for the magnetization chain in the mean field Ising Model at high
temperature (see [12]).

Theorem 3.1, given in Section 3, extends Corollary 2 to the case of
$\delta$-lazy discrete-time chains. We note that this is in fact the setting
that corresponds to the magnetization chain in the mean-field Ising Model
(see, e.g., [20]).

Following is the continuous-time version of Corollary 2.

Theorem 3. Let $(X_t^{(n)})$ be a sequence of continuous-time
birth-and-death chains. Then $(X_t^{(n)})$ exhibits cutoff in
total-variation iff $t^{(n)}_{\mathrm{REL}} = o\big(t^{(n)}_{\mathrm{MIX}}\big)$,
and the cutoff window size is at most
$\sqrt{t^{(n)}_{\mathrm{MIX}}(\tfrac14) \cdot t^{(n)}_{\mathrm{REL}}}$.

By combining our results with those of [10] (while bearing in mind the
relation between the mixing-times in total-variation and in separation), one
can relate worst-case total-variation cutoff in any continuous-time family
of irreducible birth-and-death chains, to cutoff in separation started from 0.
This suggests that total-variation cutoff should be equivalent to separation
cutoff in such chains under the original definition of the worst starting
point (as opposed to fixing the starting point at one of the endpoints).
Indeed, it turns out that for any lazy or continuous-time birth-and-death
chain, the separation is always attained by the two endpoints, as formulated
by the next proposition.

Proposition 4. Let $(X_t)$ be a lazy (or continuous-time) birth-and-death
chain with stationary distribution $\pi$. Then for every integer (resp. real)
$t > 0$, the separation $1 - \mathbb{P}_x(X_t = y)/\pi(y)$ is maximized when
$x, y$ are the endpoints.

That is, for such chains, the maximal separation from $\pi$ at time $t$ is
simply $1 - P^t(0,n)/\pi(n)$ (for the lazy chain with transition kernel $P$)
or $1 - H_t(0,n)/\pi(n)$ (for the continuous-time chain with heat kernel
$H_t$). As we later show, this implies the following corollary:

Corollary 5. For any continuous-time family of irreducible birth-and-death
chains, cutoff in worst-case total-variation distance is equivalent to cutoff
in worst-case separation.

Note that, clearly, the above equivalence is in the sense that one cutoff
implies the other, yet the cutoff locations need not be equal (and sometimes
indeed are not equal, e.g., the Bernoulli-Laplace models, surveyed in [10,
Section 7]).

The rest of this paper is organized as follows. The proofs of Theorem 1
and Corollary 2 appear in Section 2. Section 3 contains the proofs of the
variants of Theorem 1 for the continuous-case (Theorem 3) and the
$\delta$-lazy case. In Section 4, we discuss separation in general
birth-and-death chains, and provide the proofs of Proposition 4 and
Corollary 5. The final section, Section 5, is devoted to concluding remarks
and open problems.

2. Cutoff in lazy birth-and-death chains 



In this section we prove the main result, which shows that the condition
$\mathrm{gap} \cdot t_{\mathrm{MIX}} \to \infty$ is necessary and sufficient
for total-variation cutoff in lazy birth-and-death chains.

2.1. Proof of Corollary 2. The fact that any family of lazy irreducible
birth-and-death chains satisfying $t_{\mathrm{MIX}} \cdot \mathrm{gap} \to \infty$
exhibits cutoff, follows by definition from Theorem 1, as does the bound
$\sqrt{t_{\mathrm{REL}} \cdot t_{\mathrm{MIX}}}$ on the cutoff window size.

It remains to show that this condition is necessary for cutoff; this is known
to hold for any family of reversible Markov chains, using a straightforward
and well known lower bound on $t_{\mathrm{MIX}}$ in terms of
$t_{\mathrm{REL}}$ (cf., e.g., [21]). We include its proof for the sake of
completeness.

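Lemma 2.1. Let $(X_t)$ denote a reversible Markov chain, and suppose that
$t_{\mathrm{REL}} \ge 1 + \theta\, t_{\mathrm{MIX}}(\tfrac14)$ for some fixed
$\theta > 0$. Then for any $0 < \epsilon < 1$

$$t_{\mathrm{MIX}}(\epsilon) \ge t_{\mathrm{MIX}}(\tfrac14) \cdot \theta \log(1/2\epsilon) . \tag{2.1}$$

In particular, $t_{\mathrm{MIX}}(\epsilon)/t_{\mathrm{MIX}}(\tfrac14) \ge K$
for all $K > 0$ and $\epsilon < \frac12 \exp(-K/\theta)$.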

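Proof. Let $P$ denote the transition kernel of $X$, and recall that the fact
that $X$ is reversible implies that $P$ is a symmetric operator with respect
to $\langle \cdot, \cdot \rangle_\pi$ and $\mathbf{1}$ is an eigenfunction
corresponding to the trivial eigenvalue 1.

Let $\lambda$ denote the largest absolute-value of all nontrivial eigenvalues
of $P$, and let $f$ be the corresponding eigenfunction, $Pf = \pm\lambda f$,
normalized to have $\|f\|_\infty = 1$. Finally, let $r$ be the state attaining
$|f(r)| = 1$. Since $f$ is orthogonal to $\mathbf{1}$, it follows that for
any $t$,

$$\lambda^t = \big|(P^t f)(r) - \langle f, \mathbf{1} \rangle_\pi\big| \le \max_{x \in \Omega} \Big| \sum_{y \in \Omega} \big(P^t(x,y) f(y) - \pi(y) f(y)\big) \Big| \le \|f\|_\infty \max_{x \in \Omega} \|P^t(x,\cdot) - \pi\|_1 = 2 \max_{x \in \Omega} \|P^t(x,\cdot) - \pi\|_{\mathrm{TV}} .$$

Therefore, for any $0 < \epsilon < 1$ we have

$$t_{\mathrm{MIX}}(\epsilon) \ge \log_{1/\lambda}(1/2\epsilon) \ge \frac{\log(1/2\epsilon)}{\lambda^{-1} - 1} = (t_{\mathrm{REL}} - 1) \log(1/2\epsilon) , \tag{2.2}$$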

and (2.1) immediately follows. ■ 
This completes the proof of Corollary 2. ■ 
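
The two inequalities in the proof of Lemma 2.1 can be checked numerically on a concrete reversible chain. The sketch below is an illustration under stated assumptions, not part of the original argument: it uses the lazy Ehrenfest urn on $\{0,\ldots,n\}$, whose stationary law is Binomial$(n, \frac12)$ and whose eigenvalues are $1 - j/n$ for $j = 0,\ldots,n$ (a standard fact, hard-coded here rather than computed), so that $\lambda = 1 - 1/n$ and $t_{\mathrm{REL}} = n$. Function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def lazy_ehrenfest(n: int) -> Matrix:
    """Lazy Ehrenfest urn on {0,...,n}: hold w.p. 1/2, else move a uniform ball."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        P[k][k] = 0.5
        if k < n:
            P[k][k + 1] = 0.5 * (n - k) / n
        if k > 0:
            P[k][k - 1] = 0.5 * k / n
    return P


def worst_case_distances(P: Matrix, pi: List[float], t_max: int) -> List[float]:
    """Return [d(0), ..., d(t_max)] with d(t) = max_x ||P^t(x, .) - pi||_TV."""
    size = len(P)
    rows = [[1.0 if y == x else 0.0 for y in range(size)] for x in range(size)]
    d = []
    for _ in range(t_max + 1):
        d.append(max(0.5 * sum(abs(a - b) for a, b in zip(row, pi)) for row in rows))
        rows = [[sum(row[z] * P[z][y] for z in range(size)) for y in range(size)]
                for row in rows]
    return d


def mixing_time(d: List[float], eps: float) -> int:
    """min{t : d(t) <= eps} within the computed horizon."""
    for t, value in enumerate(d):
        if value <= eps:
            return t
    raise ValueError("horizon too short: increase t_max")


n = 12
pi = [math.comb(n, k) / 2 ** n for k in range(n + 1)]  # Binomial(n, 1/2)
lam = 1.0 - 1.0 / n          # assumption: known spectrum of the lazy urn
t_rel = 1.0 / (1.0 - lam)
d = worst_case_distances(lazy_ehrenfest(n), pi, t_max=200)
for eps in (0.2, 0.05, 0.01):
    lower = (t_rel - 1.0) * math.log(1.0 / (2.0 * eps))
    print(f"eps={eps}: t_mix={mixing_time(d, eps)}, lower bound={lower:.2f}")
```

Each printed mixing time should be at least the corresponding lower bound $(t_{\mathrm{REL}} - 1)\log(1/2\epsilon)$ from (2.2).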

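2.2. Proof of Theorem 1. The essence of proving the theorem lies in the
treatment of the regime where $t_{\mathrm{REL}}$ is much smaller than
$t_{\mathrm{MIX}}(\tfrac14)$.

Theorem 2.2. Let $(X_t)$ denote a lazy irreducible birth-and-death chain, and
suppose that $t_{\mathrm{REL}} < \epsilon^5 \cdot t_{\mathrm{MIX}}(\tfrac14)$
for some $0 < \epsilon < \frac{1}{16}$. Then

$$t_{\mathrm{MIX}}(4\epsilon) - t_{\mathrm{MIX}}(1 - 2\epsilon) \le (6/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot t_{\mathrm{MIX}}(\tfrac14)} .$$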

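Proof of Theorem 1. To prove Theorem 1 from Theorem 2.2, let $\epsilon > 0$,
and suppose first that $t_{\mathrm{REL}} < \epsilon^5 \cdot t_{\mathrm{MIX}}(\tfrac14)$.
If $\epsilon < \frac{1}{64}$, then the above theorem clearly implies that
(1.2) holds for $c_1 = 24/\epsilon$. Since the left-hand-side of (1.2) is
monotone decreasing in $\epsilon$, this result extends to any value of
$\epsilon < \frac12$ by choosing

$$c_1 = c_1(\epsilon) = 24 \max\{1/\epsilon, 64\} .$$

It remains to treat the case where
$t_{\mathrm{REL}} \ge \epsilon^5 \cdot t_{\mathrm{MIX}}(\tfrac14)$. In this
case, the sub-multiplicativity of the mixing-time (see, e.g.,
[4, Chapter 2]) gives

$$t_{\mathrm{MIX}}(\epsilon) \le t_{\mathrm{MIX}}(\tfrac14) \lceil \tfrac12 \log_2(1/\epsilon) \rceil \quad \text{for any } 0 < \epsilon < \tfrac14 . \tag{2.3}$$

In particular, for $\epsilon < \frac{1}{64}$ our assumption on
$t_{\mathrm{REL}}$ gives

$$t_{\mathrm{MIX}}(\epsilon) - t_{\mathrm{MIX}}(1 - \epsilon) \le t_{\mathrm{MIX}}(\epsilon) \le \epsilon^{-5/2} \log_2(1/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot t_{\mathrm{MIX}}(\tfrac14)} .$$

Therefore, a choice of

$$c_2 = c_2(\epsilon) = \max\{\log_2(1/\epsilon)/\epsilon^{5/2}, 64\}$$

gives (1.2) for any $\epsilon < \frac12$ (the case $\epsilon > \frac{1}{64}$
again follows from monotonicity). Altogether, a choice of
$c_\epsilon = \max\{c_1, c_2\}$ completes the proof. ■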

In the remainder of this section, we provide the proof of Theorem 2.2. To 
this end, we must first establish several lemmas. 

Let $X = X(t)$ be the given (lazy irreducible) birth-and-death chain, and
from now on, let $\Omega_n = \{0, \ldots, n\}$ denote its state space. Let
$P$ denote the transition kernel of $X$, and let $\pi$ denote its stationary
distribution. Our first argument relates the mixing-time of the chain,
starting from various starting positions, with its hitting time from 0 to
certain quantile states, defined next.

$$Q(\epsilon) := \min\Big\{k : \sum_{j=0}^{k} \pi(j) \ge \epsilon\Big\} , \quad \text{where } 0 < \epsilon < 1 . \tag{2.4}$$

Similarly, one may define the hitting times from $n$ as follows:

$$\tilde{Q}(\epsilon) := \max\Big\{k : \sum_{j=k}^{n} \pi(j) \ge \epsilon\Big\} , \quad \text{where } 0 < \epsilon < 1 . \tag{2.5}$$

Remark. Throughout the proof, we will occasionally need to shift from
$Q(\epsilon)$ to $\tilde{Q}(1 - \epsilon)$, and vice versa. Though the proof
can be written in terms of $Q, \tilde{Q}$, for the sake of simplicity it will
be easier to have the symmetry

$$Q(\epsilon) = \tilde{Q}(1 - \epsilon) \quad \text{for almost any } \epsilon > 0 . \tag{2.6}$$

This is easily achieved by noticing that at most $n$ values of $\epsilon$ do
not satisfy (2.6) for a given chain $X(t)$ on $n$ states. Hence, for any given
countable family of chains, we can eliminate a countable set of all such
problematic values of $\epsilon$ and obtain the above mentioned symmetry.

Recalling that we defined $\mathbb{P}_k$ to be the probability on the event
that the starting position is $k$, we define $\mathbb{E}_k$ and
$\operatorname{Var}_k$ analogously. Finally, here and in what follows, let
$\tau_k$ denote the hitting-time of the state $k$, that is,
$\tau_k := \min\{t : X(t) = k\}$.
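
The quantile states (2.4), (2.5) and the symmetry (2.6) are easy to experiment with. The following sketch is an illustration with hypothetical helper names `Q` and `Q_tilde`; it uses exact rational arithmetic so that the exceptional values of $\epsilon$ (those where a cumulative sum of $\pi$ equals $\epsilon$ exactly) are detected reliably. The Binomial$(4, \frac12)$ law is an arbitrary example.

```python
from fractions import Fraction
from typing import List, Sequence


def Q(pi: Sequence[Fraction], eps: Fraction) -> int:
    """Q(eps) = min{k : pi(0) + ... + pi(k) >= eps}, as in (2.4)."""
    total = Fraction(0)
    for k, p in enumerate(pi):
        total += p
        if total >= eps:
            return k
    raise ValueError("eps must lie in (0, 1)")


def Q_tilde(pi: Sequence[Fraction], eps: Fraction) -> int:
    """Q~(eps) = max{k : pi(k) + ... + pi(n) >= eps}, as in (2.5)."""
    total = Fraction(0)
    # Tail sums grow as k decreases, so the first k that qualifies is the max.
    for k in range(len(pi) - 1, -1, -1):
        total += pi[k]
        if total >= eps:
            return k
    raise ValueError("eps must lie in (0, 1)")


# Example law: Binomial(4, 1/2) on {0,...,4}.
pi: List[Fraction] = [Fraction(c, 16) for c in (1, 4, 6, 4, 1)]
grid = [Fraction(k, 16) for k in range(1, 16)]
exceptional = [eps for eps in grid if Q(pi, eps) != Q_tilde(pi, 1 - eps)]
print(exceptional)
```

In this example the symmetry fails only when $\epsilon$ coincides with one of the partial sums $\pi(0) + \cdots + \pi(k)$, $k < 4$, which matches the finite exceptional set described in the Remark.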

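Lemma 2.3. For any fixed $0 < \epsilon < 1$ and lazy irreducible
birth-and-death chain $X$, the following holds for any $t$:

$$\|P^t(0, \cdot) - \pi\|_{\mathrm{TV}} \le \mathbb{P}_0(\tau_{Q(1-\epsilon)} > t) + \epsilon , \tag{2.7}$$

and for all $k \in \Omega_n$,

$$\|P^t(k, \cdot) - \pi\|_{\mathrm{TV}} \le \mathbb{P}_k(\max\{\tau_{Q(\epsilon)}, \tau_{Q(1-\epsilon)}\} > t) + 2\epsilon . \tag{2.8}$$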

Proof. Let $X$ denote an instance of the lazy birth-and-death chain starting
from a given state $k$, and let $\tilde{X}$ denote another instance of the
lazy chain starting from the stationary distribution. Consider the following
no-crossing coupling of these two chains: at each step, a fair coin toss
decides which of the two chains moves according to its original (non-lazy)
rule. Clearly, this coupling does not allow the two chains to cross one
another without sharing the same state first (hence the name for the
coupling). Furthermore, notice that by definition, each of the two chains,
given the number of coin tosses that went its way, is independent of the
other chain. Finally, for any $t$, $\tilde{X}(t)$, given the number of coin
tosses that went its way until time $t$, has the stationary distribution.

In order to deduce the mixing-time bounds, we show an upper bound on the time
it takes $X$ and $\tilde{X}$ to coalesce. Consider the hitting time of $X$
from 0 to $Q(1-\epsilon)$, denoted by $\tau_{Q(1-\epsilon)}$. By the above
argument, $\tilde{X}(\tau_{Q(1-\epsilon)})$ enjoys the stationary
distribution, hence by the definition of $Q(1-\epsilon)$,

$$\mathbb{P}\big(\tilde{X}(\tau_{Q(1-\epsilon)}) \le X(\tau_{Q(1-\epsilon)})\big) \ge 1 - \epsilon .$$

Therefore, by the property of the no-crossing coupling, $X$ and $\tilde{X}$
must have coalesced by time $\tau_{Q(1-\epsilon)}$ with probability at least
$1 - \epsilon$. This implies (2.7), and it remains to prove (2.8). Notice that
the above argument involving the no-crossing coupling, this time with $X$
starting from $k$, gives

$$\mathbb{P}\big(\tilde{X}(\tau_{Q(\epsilon)}) \ge X(\tau_{Q(\epsilon)})\big) \ge 1 - \epsilon ,$$

and similarly,

$$\mathbb{P}\big(\tilde{X}(\tau_{Q(1-\epsilon)}) \le X(\tau_{Q(1-\epsilon)})\big) \ge 1 - \epsilon .$$

Therefore, the probability that $X$ and $\tilde{X}$ coalesce between the
times $\tau_{Q(\epsilon)}$ and $\tau_{Q(1-\epsilon)}$ is at least
$1 - 2\epsilon$, completing the proof. ■

Corollary 2.4. Let $X(t)$ be a lazy irreducible birth-and-death chain on
$\Omega_n$. The following holds for any $0 < \epsilon < \frac{1}{16}$:

$$t_{\mathrm{MIX}}(\tfrac14) \le 16 \max\{\mathbb{E}_0 \tau_{Q(1-\epsilon)}, \mathbb{E}_n \tau_{Q(\epsilon)}\} . \tag{2.9}$$

Proof. Clearly, for any source and target states $x, y \in \Omega_n$, at
least one of the endpoints $s \in \{0, n\}$ satisfies
$\mathbb{E}_s \tau_y \ge \mathbb{E}_x \tau_y$ (by the definition of the
birth-and-death chain). Therefore, if $T$ denotes the right-hand-side of
(2.9), then

$$\mathbb{P}_x(\max\{\tau_{Q(\epsilon)}, \tau_{Q(1-\epsilon)}\} > T) \le \mathbb{P}_x(\tau_{Q(\epsilon)} > T) + \mathbb{P}_x(\tau_{Q(1-\epsilon)} > T) \le \tfrac18 ,$$

where the last transition is by Markov's inequality. The proof now follows
directly from (2.8). ■
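
A quick numerical check of (2.9) on one example, offered as a sketch rather than as part of the proof. It assumes the lazy Ehrenfest urn as the test chain and uses the standard closed form for expected hitting times between neighbouring states of a birth-and-death chain, $\mathbb{E}_k \tau_{k+1} = \sum_{j \le k} \pi(j) / (\pi(k) P(k,k+1))$, together with its mirror image for leftward moves. This identity is not stated in the text above, and all function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def lazy_ehrenfest(n: int) -> Matrix:
    """Lazy Ehrenfest urn on {0,...,n}: hold w.p. 1/2, else move a uniform ball."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        P[k][k] = 0.5
        if k < n:
            P[k][k + 1] = 0.5 * (n - k) / n
        if k > 0:
            P[k][k - 1] = 0.5 * k / n
    return P


def stationary(P: Matrix) -> List[float]:
    """Stationary law of an irreducible birth-and-death chain (detailed balance)."""
    w = [1.0]
    for k in range(len(P) - 1):
        w.append(w[-1] * P[k][k + 1] / P[k + 1][k])
    total = sum(w)
    return [x / total for x in w]


def quantile(pi: List[float], eps: float) -> int:
    """Q(eps) = min{k : pi(0) + ... + pi(k) >= eps}."""
    total = 0.0
    for k, p in enumerate(pi):
        total += p
        if total >= eps:
            return k
    return len(pi) - 1


def hit_right(P: Matrix, pi: List[float], a: int, b: int) -> float:
    """E_a[tau_b] for a <= b, summing one-step-right expectations."""
    return sum(sum(pi[: k + 1]) / (pi[k] * P[k][k + 1]) for k in range(a, b))


def hit_left(P: Matrix, pi: List[float], a: int, b: int) -> float:
    """E_a[tau_b] for a >= b, summing one-step-left expectations."""
    return sum(sum(pi[k:]) / (pi[k] * P[k][k - 1]) for k in range(b + 1, a + 1))


def mixing_time(P: Matrix, pi: List[float], eps: float, t_max: int = 10_000) -> int:
    """min{t : max_x ||P^t(x, .) - pi||_TV <= eps}, by direct iteration."""
    size = len(P)
    rows = [[1.0 if y == x else 0.0 for y in range(size)] for x in range(size)]
    for t in range(t_max + 1):
        d_t = max(0.5 * sum(abs(a - b) for a, b in zip(row, pi)) for row in rows)
        if d_t <= eps:
            return t
        rows = [[sum(row[z] * P[z][y] for z in range(size)) for y in range(size)]
                for row in rows]
    raise ValueError("horizon too short: increase t_max")


n, eps = 12, 0.05  # eps < 1/16, as required by Corollary 2.4
P = lazy_ehrenfest(n)
pi = stationary(P)
lo, hi = quantile(pi, eps), quantile(pi, 1 - eps)
bound = 16 * max(hit_right(P, pi, 0, hi), hit_left(P, pi, n, lo))
print(lo, hi, mixing_time(P, pi, 0.25), bound)
```

On such a small chain the bound is far from tight; the point of (2.9) is only the order of magnitude.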

Remark. The above corollary shows that the order of the mixing time is at
most $\max\{\mathbb{E}_0 \tau_{Q(1-\epsilon)}, \mathbb{E}_n \tau_{Q(\epsilon)}\}$.
It is in fact already possible (and not difficult) to show that the mixing
time has this order precisely. However, our proof only uses the order of the
mixing-time as an upper-bound, in order to finally deduce a stronger result:
this mixing-time is asymptotically equal to the above maximum of the expected
hitting times.

Having established that the order of the mixing-time is at most the expected
hitting time of $Q(1-\epsilon)$ and $Q(\epsilon)$ from the two endpoints of
$\Omega_n$, assume here and in what follows, without loss of generality, that
$\mathbb{E}_0 \tau_{Q(1-\epsilon)}$ is at least
$\mathbb{E}_n \tau_{Q(\epsilon)}$. Thus, (2.9) gives

$$t_{\mathrm{MIX}}(\tfrac14) \le 16 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \quad \text{for any } 0 < \epsilon < \tfrac{1}{16} . \tag{2.10}$$

A key element in our estimation is a result of Karlin and McGregor
[16, Equation (45)], reproved by Keilson [17], which represents
hitting-times for birth-and-death chains in continuous-time as a sum of
independent exponential variables (see [13], [9], [14] for more on this
result). The discrete-time version of this result was given by Fill
[13, Theorem 1.2].

Theorem 2.5 ([13]). Consider a discrete-time birth-and-death chain with
transition kernel $P$ on the state space $\{0, \ldots, d\}$ started at 0.
Suppose that $d$ is an absorbing state, and suppose that the other birth
probabilities $p_i$, $0 \le i \le d-1$, and death probabilities $q_i$,
$1 \le i \le d-1$, are positive. Then the absorption time in state $d$ has
probability generating function

$$u \mapsto \prod_{j=0}^{d-1} \left[ \frac{(1 - \theta_j) u}{1 - \theta_j u} \right] , \tag{2.11}$$

where $-1 \le \theta_j < 1$ are the $d$ non-unit eigenvalues of $P$.
Furthermore, if $P$ has nonnegative eigenvalues then the absorption time in
state $d$ is distributed as the sum of $d$ independent geometric random
variables whose failure probabilities are the non-unit eigenvalues of $P$.

The above theorem provides means of establishing the concentration of
the passage time from left to right of a chain, where the target (right end)
state is turned into an absorbing state. Since we are interested in the
hitting time from one end to a given state (namely, from 0 to
$Q(1-\epsilon)$), it is clearly equivalent to consider the chain where the
target state is absorbing. We thus turn to handle the hitting time of an
absorbing end of a chain starting from the other end. The following lemma
will infer its concentration from Theorem 2.5.

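Lemma 2.6. Let $X(t)$ be a lazy irreducible birth-and-death chain on the
state space $\{0, \ldots, d\}$, where $d$ is an absorbing state, and let gap
denote its spectral gap. Then
$\operatorname{Var}_0 \tau_d \le (\mathbb{E}_0 \tau_d)/\mathrm{gap}$.

Proof. Let $\theta_0 \ge \ldots \ge \theta_{d-1}$ denote the $d$ non-unit
eigenvalues of the transition kernel of $X$. Recalling that $X$ is a lazy
irreducible birth-and-death chain, $\theta_i \ge 0$ for all $i$, hence the
second part of Theorem 2.5 implies that $\tau_d \sim \sum_{i=0}^{d-1} Y_i$,
where the $Y_i$-s are independent geometric random variables with means
$1/(1 - \theta_i)$. Therefore,

$$\mathbb{E}_0 \tau_d = \sum_{i=0}^{d-1} \frac{1}{1 - \theta_i} , \qquad \operatorname{Var}_0 \tau_d = \sum_{i=0}^{d-1} \frac{\theta_i}{(1 - \theta_i)^2} , \tag{2.12}$$

which, using the fact that $\theta_0 \ge \theta_i$ for all $i$, gives

$$\operatorname{Var}_0 \tau_d \le \frac{1}{1 - \theta_0} \sum_{i=0}^{d-1} \frac{1}{1 - \theta_i} = \frac{\mathbb{E}_0 \tau_d}{\mathrm{gap}} ,$$

as required. ■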
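
The moment formulas (2.12) can be checked in the simplest setting where the eigenvalues are visible by inspection. The sketch below is an illustration under a stated assumption: it uses a pure-birth chain (all death probabilities zero, a degenerate case that the hypotheses of Theorem 2.5 exclude), with $P(i,i) = \theta_i$ and $P(i,i+1) = 1 - \theta_i$. Its kernel is triangular, so the non-unit eigenvalues are exactly the $\theta_i$, and the absorption time is a sum of independent geometric variables by construction. The particular values of $\theta_i$ and the function names are arbitrary choices.

```python
from typing import List, Tuple


def absorption_pmf(thetas: List[float], t_max: int) -> List[float]:
    """pmf[t] = P_0(tau_d = t) for the pure-birth chain with holding
    probabilities thetas[i] at state i and absorbing state d = len(thetas)."""
    d = len(thetas)
    alive = [1.0] + [0.0] * (d - 1)  # mass on the transient states 0..d-1
    pmf = [0.0]                      # tau_d = 0 is impossible when d >= 1
    for _ in range(t_max):
        # Mass that leaves state d-1 at this step is absorbed now.
        pmf.append(alive[d - 1] * (1.0 - thetas[d - 1]))
        new = [0.0] * d
        for i in range(d):
            new[i] += alive[i] * thetas[i]
            if i + 1 < d:
                new[i + 1] += alive[i] * (1.0 - thetas[i])
        alive = new
    return pmf


def mean_and_variance(pmf: List[float]) -> Tuple[float, float]:
    """First two moments of a (numerically truncated) pmf on {0, 1, 2, ...}."""
    mean = sum(t * p for t, p in enumerate(pmf))
    second = sum(t * t * p for t, p in enumerate(pmf))
    return mean, second - mean * mean


thetas = [0.5, 0.6, 0.75, 0.9]  # lazy: every holding probability is >= 1/2
mean, var = mean_and_variance(absorption_pmf(thetas, t_max=3000))
mean_formula = sum(1.0 / (1.0 - th) for th in thetas)
var_formula = sum(th / (1.0 - th) ** 2 for th in thetas)
gap = 1.0 - max(thetas)
print(mean, mean_formula)
print(var, var_formula, mean_formula / gap)
```

The truncation at `t_max=3000` discards a negligible tail for these parameters; the last printed value is the upper bound $\mathbb{E}_0 \tau_d / \mathrm{gap}$ of Lemma 2.6.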



As we stated before, the hitting time of a state in our original chain has 
the same distribution as the hitting time in the modified chain (where this 
state is set to be an absorbing state). In order to derive concentration from 
the above lemma, all that remains is to relate the spectral gaps of these two 
chains. This is achieved by the next lemma. 

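Lemma 2.7. Let $X(t)$ be a lazy irreducible birth-and-death chain, and gap
be its spectral gap. Set $0 < \epsilon < 1$, and let $\ell = Q(1 - \epsilon)$.
Consider the modified chain $Y(t)$, where $\ell$ is turned into an absorbing
state, and let $\mathrm{gap}|_{[0,\ell]}$ denote its spectral gap. Then
$\mathrm{gap}|_{[0,\ell]} \ge \epsilon \cdot \mathrm{gap}$.

Proof. By [4, Chapter 3, Section 6], we have

$$\mathrm{gap} = \min_{\substack{f : \mathbb{E}_\pi f = 0 \\ f \not\equiv 0}} \frac{\langle (I - P) f, f \rangle_\pi}{\langle f, f \rangle_\pi} = \min_{\substack{f : \mathbb{E}_\pi f = 0 \\ f \not\equiv 0}} \frac{\frac12 \sum_{i,j} (f(i) - f(j))^2 P(i,j) \pi(i)}{\sum_i f(i)^2 \pi(i)} . \tag{2.13}$$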

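Observe that $\mathrm{gap}|_{[0,\ell]}$ is precisely $1 - \lambda$, where
$\lambda$ is the largest eigenvalue of $P|_\ell$, the principal sub-matrix on
the first $\ell$ rows and columns, indexed by $\{0, \ldots, \ell - 1\}$
(notice that this sub-matrix is strictly sub-stochastic, as $X$ is
irreducible). Being a birth-and-death chain, $X$ is reversible, that is,

$$P(i,j) \pi(i) = P(j,i) \pi(j) \quad \text{for any } i, j .$$

Therefore, it is simple to verify that $P|_\ell$ is a symmetric operator on
$\mathbb{R}^\ell$ with respect to the inner-product
$\langle \cdot, \cdot \rangle_\pi$; that is,
$\langle P|_\ell\, x, y \rangle_\pi = \langle x, P|_\ell\, y \rangle_\pi$ for
every $x, y \in \mathbb{R}^\ell$, and hence the Rayleigh-Ritz formula holds
(cf., e.g., [15]), giving

$$\lambda = \max_{x \in \mathbb{R}^\ell,\ x \ne 0} \frac{\langle P|_\ell\, x, x \rangle_\pi}{\langle x, x \rangle_\pi} .$$

It follows that

$$\mathrm{gap}|_{[0,\ell]} = 1 - \lambda = \min_{\substack{f \not\equiv 0 \\ f(k) = 0\ \forall k \ge \ell}} \frac{\sum_{i=0}^{n} \big(f(i) - \sum_{j=0}^{n} P(i,j) f(j)\big) f(i) \pi(i)}{\sum_i f(i)^2 \pi(i)} = \min_{\substack{f \not\equiv 0 \\ f(k) = 0\ \forall k \ge \ell}} \frac{\frac12 \sum_{0 \le i,j \le n} (f(i) - f(j))^2 P(i,j) \pi(i)}{\sum_i f(i)^2 \pi(i)} , \tag{2.14}$$

where the last equality is by the fact that $P$ is stochastic.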

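Observe that (2.13) and (2.14) have similar forms, and for any $f$ (which
can also be treated as a random variable) we can write
$\tilde{f} = f - \mathbb{E}_\pi f$ such that $\mathbb{E}_\pi \tilde{f} = 0$.
Clearly,

$$(\tilde{f}(i) - \tilde{f}(j))^2 P(i,j) \pi(i) = (f(i) - f(j))^2 P(i,j) \pi(i) ,$$

hence in order to compare gap and $\mathrm{gap}|_{[0,\ell]}$, it will suffice
to compare the denominators of (2.13) and (2.14). Noticing that

$$\operatorname{Var}_\pi(f) = \sum_i \tilde{f}(i)^2 \pi(i) , \quad \text{and} \quad \mathbb{E}_\pi f^2 = \sum_i f(i)^2 \pi(i) ,$$

we wish to bound the ratio between the above two terms. Without loss of
generality, assume that $\mathbb{E}_\pi f = 1$. Then every $f$ with
$f(k) = 0$ for all $k \ge \ell$ satisfies

$$\frac{\mathbb{E}_\pi f^2}{\pi(f \ne 0)} = \mathbb{E}_\pi\big[f^2 \mid f \ne 0\big] \ge \big(\mathbb{E}_\pi[f \mid f \ne 0]\big)^2 = \big(\pi(f \ne 0)\big)^{-2} ,$$

and hence

$$\frac{1}{\mathbb{E}_\pi f^2} \le \pi(f \ne 0) \le 1 - \epsilon , \tag{2.15}$$

where the last inequality is by the definition of $\ell$ as $Q(1 - \epsilon)$.
Once again, using the fact that $\mathbb{E}_\pi f = 1$, we deduce that

$$\frac{\operatorname{Var}_\pi f}{\mathbb{E}_\pi f^2} = 1 - \frac{1}{\mathbb{E}_\pi f^2} \ge \epsilon . \tag{2.16}$$

Altogether, by the above discussion on the comparison between (2.13) and
(2.14), we conclude that
$\mathrm{gap}|_{[0,\ell]} \ge \epsilon \cdot \mathrm{gap}$. ■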

Combining Lemma 2.6 and Lemma 2.7 yields the following corollary: 

Corollary 2.8. Let $X(t)$ be a lazy irreducible birth-and-death chain on
$\Omega_n$, let gap denote its spectral-gap, and $0 < \epsilon < 1$. The
following holds:

$$\operatorname{Var}_0 \tau_{Q(1-\epsilon)} \le \frac{\mathbb{E}_0 \tau_{Q(1-\epsilon)}}{\epsilon \cdot \mathrm{gap}} . \tag{2.17}$$

Remark. The above corollary implies the following statement: whenever
$\mathrm{gap} \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \to \infty$ with $n$,
the hitting-time $\tau_{Q(1-\epsilon)}$ is concentrated, as
$\operatorname{Var}_0 \tau_{Q(1-\epsilon)} = o\big((\mathbb{E}_0 \tau_{Q(1-\epsilon)})^2\big)$.
This is essentially the case under the assumptions of Theorem 2.2 (which
include a lower bound on $\mathrm{gap} \cdot t_{\mathrm{MIX}}(\tfrac14)$ in
terms of $\epsilon$), as we already established in (2.10) that
$\mathbb{E}_0 \tau_{Q(1-\epsilon)} \ge \frac{1}{16} t_{\mathrm{MIX}}(\tfrac14)$.

Recalling the definition of cutoff and the relation between the mixing
time and hitting times of the quantile states, we expect that the behaviors
of $\tau_{Q(\epsilon)}$ and $\tau_{Q(1-\epsilon)}$ would be roughly the same;
this is formulated in the following lemma.

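Lemma 2.9. Let $X(t)$ be a lazy irreducible birth-and-death chain on
$\Omega_n$, and suppose that for some $0 < \epsilon < \frac{1}{16}$ we have
$t_{\mathrm{REL}} < \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)}$. Then
for any fixed $\epsilon \le \alpha < \beta \le 1 - \epsilon$:

$$\mathbb{E}_{Q(\alpha)} \tau_{Q(\beta)} \le \frac{3}{2\epsilon} \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} . \tag{2.18}$$

Proof. Since by definition,
$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \ge \mathbb{E}_{Q(\alpha)} \tau_{Q(\beta)}$
(the left-hand-side can be written as a sum of three independent hitting
times, one of which being the right-hand-side), it suffices to show (2.18)
holds for $\alpha = \epsilon$ and $\beta = 1 - \epsilon$.

Consider the random variable $\nu$, distributed according to the restriction
of the stationary distribution $\pi$ to
$[Q(\epsilon)] := \{0, \ldots, Q(\epsilon)\}$, that is:

$$\nu(k) := \frac{\pi(k)}{\pi([Q(\epsilon)])} \mathbf{1}_{[Q(\epsilon)]}(k) , \tag{2.19}$$

and let $w \in \mathbb{R}^{\Omega_n}$ denote the vector
$w := \mathbf{1}_{[Q(\epsilon)]} / \pi([Q(\epsilon)])$. As $X$ is reversible,
the following holds for any $k$:

$$P^t(\nu, k) = \sum_i P^t(i,k) \pi(i) w(i) = (P^t w)(k) \cdot \pi(k) .$$

Thus, by the definition of the total-variation distance (for a finite space):

$$\|P^t(\nu, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} = \frac12 \sum_{k=0}^{n} \pi(k) \big|(P^t w)(k) - 1\big| = \frac12 \|P^t(w - \mathbf{1})\|_{L^1(\pi)} \le \frac12 \|P^t(w - \mathbf{1})\|_{L^2(\pi)} ,$$

where the last inequality follows from the Cauchy-Schwarz inequality. As
$w - \mathbf{1}$ is orthogonal to $\mathbf{1}$ in the inner-product space
$\langle \cdot, \cdot \rangle_\pi$, we deduce that

$$\|P^t(w - \mathbf{1})\|_{L^2(\pi)} \le \lambda_2^t \|w - \mathbf{1}\|_{L^2(\pi)} ,$$

where $\lambda_2$ is the second largest eigenvalue of $P$. Therefore,

$$\|P^t(\nu, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \le \frac12 \lambda_2^t \|w - \mathbf{1}\|_{L^2(\pi)} = \frac12 \lambda_2^t \sqrt{(1/\pi([Q(\epsilon)])) - 1} \le \frac{\lambda_2^t}{2\sqrt{\epsilon}} ,$$

where the last inequality is by the fact that
$\pi([Q(\epsilon)]) \ge \epsilon$ (by definition). Recalling that
$t_{\mathrm{REL}} = \mathrm{gap}^{-1} = 1/(1 - \lambda_2)$, define

$$t_\epsilon := \big\lceil \tfrac32 \log(1/\epsilon)\, t_{\mathrm{REL}} \big\rceil ,$$

and notice that, as $\epsilon < \frac{1}{16}$ and $t_{\mathrm{REL}} \ge 1$, we
have $\frac12 \log(1/\epsilon)\, t_{\mathrm{REL}} > 1$, and so

$$t_\epsilon \le 2 \log(1/\epsilon)\, t_{\mathrm{REL}} .$$

Since $\log(1/x) \ge 1 - x$ for all $x \in (0, 1]$, it follows that
$\lambda_2^{t_\epsilon} \le \epsilon^{3/2}$, thus

$$\|P^{t_\epsilon}(\nu, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \le \epsilon/2 . \tag{2.20}$$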

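We will next use a second moment argument to obtain an upper bound
on the expected commute time. By (2.20) and the definition of the
total-variation distance,

$$\mathbb{P}_\nu(\tau_{Q(1-\epsilon)} \le t_\epsilon) \ge \epsilon - \|P^{t_\epsilon}(\nu, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \ge \epsilon/2 ,$$

whereas the definition of $\nu$ as being supported by the range
$[Q(\epsilon)]$ gives

$$\mathbb{P}_\nu(\tau_{Q(1-\epsilon)} \le t_\epsilon) \le \mathbb{P}_{Q(\epsilon)}(\tau_{Q(1-\epsilon)} \le t_\epsilon) \le \frac{\operatorname{Var}_{Q(\epsilon)} \tau_{Q(1-\epsilon)}}{\big|\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} - t_\epsilon\big|^2} .$$

Combining the two,

$$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le t_\epsilon + \sqrt{\frac{2}{\epsilon} \operatorname{Var}_{Q(\epsilon)} \tau_{Q(1-\epsilon)}} . \tag{2.21}$$

Recall that starting from 0, the hitting time to point $Q(1-\epsilon)$ is
exactly the sum of the hitting time from 0 to $Q(\epsilon)$ and the hitting
time from $Q(\epsilon)$ to $Q(1-\epsilon)$, where both these hitting times are
independent. Therefore,

$$\operatorname{Var}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le \operatorname{Var}_0 \tau_{Q(1-\epsilon)} . \tag{2.22}$$

By (2.21) and (2.22) we get

$$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le t_\epsilon + \sqrt{\frac{2}{\epsilon} \operatorname{Var}_0 \tau_{Q(1-\epsilon)}} \le 2 \log(1/\epsilon)\, t_{\mathrm{REL}} + (1/\epsilon) \sqrt{2\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)}} , \tag{2.23}$$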
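where the last inequality is by Corollary 2.8.

We now wish to rewrite the bound (2.23) in terms of
$t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}$ using our assumptions
on $t_{\mathrm{REL}}$ and $\mathbb{E}_0 \tau_{Q(\frac12)}$. First, twice
plugging in the fact that
$t_{\mathrm{REL}} < \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)}$ yields

$$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le \big(2\epsilon^3 \log(1/\epsilon) + \sqrt{2}\big)\, \epsilon \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \tfrac32 \epsilon \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} , \tag{2.24}$$

where in the last inequality we used the fact that
$\epsilon < \frac{1}{16}$. In particular,

$$\mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \mathbb{E}_0 \tau_{Q(\frac12)} + \mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le \mathbb{E}_0 \tau_{Q(\frac12)} + \tfrac32 \epsilon \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} ,$$

and after rearranging,

$$\mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \big(\mathbb{E}_0 \tau_{Q(\frac12)}\big) / \big(1 - \tfrac32 \epsilon\big) . \tag{2.25}$$

Plugging this result back in (2.23), we deduce that

$$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le 2 \log(1/\epsilon) \cdot t_{\mathrm{REL}} + \frac{1}{\epsilon} \sqrt{\frac{2\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}}{1 - \frac32 \epsilon}} .$$

A final application of the fact
$t_{\mathrm{REL}} < \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)}$,
together with (2.25) and the fact that $\epsilon < \frac{1}{16}$, gives

$$\mathbb{E}_{Q(\epsilon)} \tau_{Q(1-\epsilon)} \le \big(2\epsilon^2 \log(1/\epsilon) + \sqrt{2}/\epsilon\big) \sqrt{\frac{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}}{1 - \frac32 \epsilon}} \le \frac{3}{2\epsilon} \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} , \tag{2.26}$$

as required. ■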

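We are now ready to prove the main theorem.

Proof of Theorem 2.2. Recall our assumption (without loss of generality)

$$\mathbb{E}_0 \tau_{Q(1-\epsilon)} \ge \mathbb{E}_n \tau_{Q(\epsilon)} , \tag{2.27}$$

and define what would be two ends of the cutoff window:

$$t^- = t^-(\gamma) := \Big\lfloor \mathbb{E}_0 \tau_{Q(\frac12)} - \gamma \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} \Big\rfloor , \qquad t^+ = t^+(\gamma) := \Big\lceil \mathbb{E}_0 \tau_{Q(\frac12)} + \gamma \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} \Big\rceil .$$

For the lower bound, let $0 < \epsilon < \frac{1}{16}$; combining (2.10) with
the assumption that
$t_{\mathrm{REL}} \le \epsilon^5 \cdot t_{\mathrm{MIX}}(\tfrac14)$ gives

$$t_{\mathrm{REL}} \le 16 \epsilon^5 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} . \tag{2.28}$$

Thus, we may apply Lemma 2.9 to get

$$\mathbb{E}_0 \tau_{Q(\epsilon)} = \mathbb{E}_0 \tau_{Q(\frac12)} - \mathbb{E}_{Q(\epsilon)} \tau_{Q(\frac12)} \ge \mathbb{E}_0 \tau_{Q(\frac12)} - \frac{3}{2\epsilon} \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} .$$

Furthermore, recalling Corollary 2.8, we also have

$$\operatorname{Var}_0 \tau_{Q(\epsilon)} \le \frac{1}{1 - \epsilon}\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\epsilon)} \le 2\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)} .$$

Therefore, by Chebyshev's inequality, the following holds for any
$\gamma > \frac{3}{2\epsilon}$:

$$\|P^{t^-}(0, \cdot) - \pi\|_{\mathrm{TV}} \ge 1 - \epsilon - \mathbb{P}_0(\tau_{Q(\epsilon)} \le t^-) \ge 1 - \epsilon - 2 \Big(\gamma - \frac{3}{2\epsilon}\Big)^{-2} ,$$

and a choice of $\gamma = 2/\epsilon$ implies that (with room to spare, as
$\epsilon < \frac{1}{16}$)

$$t_{\mathrm{MIX}}(1 - 2\epsilon) \ge \mathbb{E}_0 \tau_{Q(\frac12)} - (2/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} . \tag{2.29}$$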
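The upper bound will follow from a similar argument. Take
$0 < \epsilon < \frac{1}{16}$ and recall that
$t_{\mathrm{REL}} < \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)}$.
Applying Corollary 2.8 and Lemma 2.9 once more (with (2.25) as well as (2.27)
in mind) yields:

$$\mathbb{E}_n \tau_{Q(\epsilon)} \le \mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \mathbb{E}_0 \tau_{Q(\frac12)} + \frac{3}{2\epsilon} \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} , \tag{2.30}$$

$$\operatorname{Var}_0 \tau_{Q(1-\epsilon)} \le (1/\epsilon)\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \frac{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}}{\epsilon (1 - \frac32 \epsilon)} \le (2/\epsilon)\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)} ,$$

$$\operatorname{Var}_n \tau_{Q(\epsilon)} \le (1/\epsilon)\, t_{\mathrm{REL}} \cdot \mathbb{E}_n \tau_{Q(\epsilon)} \le (2/\epsilon)\, t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)} .$$

Hence, combining Chebyshev's inequality with (2.8) implies that for all $k$
and $\gamma > \frac{3}{2\epsilon}$,

$$\|P^{t^+}(k, \cdot) - \pi\|_{\mathrm{TV}} \le 2\epsilon + \mathbb{P}_0(\tau_{Q(1-\epsilon)} > t^+) + \mathbb{P}_n(\tau_{Q(\epsilon)} > t^+) \le 2\epsilon + \frac{4}{\epsilon} \Big(\gamma - \frac{3}{2\epsilon}\Big)^{-2} .$$

Choosing $\gamma$ large enough that the last term is at most $2\epsilon$, we
therefore get

$$t_{\mathrm{MIX}}(4\epsilon) \le t^+(\gamma) = \Big\lceil \mathbb{E}_0 \tau_{Q(\frac12)} + \gamma \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} \Big\rceil .$$

Note that $Q(\frac12) > 0$, since otherwise
$\mathbb{E}_0 \tau_{Q(\frac12)} = 0$ and thus (2.30) would imply that
$\mathbb{E}_n \tau_{Q(\epsilon)} = \mathbb{E}_0 \tau_{Q(1-\epsilon)} = 0$.
Indeed, in that case, we would get $Q(1 - \epsilon) = 0$ and yet
$Q(\epsilon) = n$, and therefore $n = 0$, turning the statement of the theorem
to be trivially true. It follows that
$t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)} \ge 1$, and combining
this with the fact that $\epsilon < \frac{1}{16}$ we conclude that
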
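$$t_{\mathrm{MIX}}(4\epsilon) \le \mathbb{E}_0 \tau_{Q(\frac12)} + (3/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} . \tag{2.31}$$

We have thus established the cutoff window in terms of $t_{\mathrm{REL}}$ and
$\mathbb{E}_0 \tau_{Q(\frac12)}$, and it remains to write it in terms of
$t_{\mathrm{REL}}$ and $t_{\mathrm{MIX}}$. To this end, recall that (2.25)
implies that

$$t_{\mathrm{REL}} \le \epsilon^4 \cdot \mathbb{E}_0 \tau_{Q(1-\epsilon)} \le \frac{\epsilon^4}{1 - \frac32 \epsilon}\, \mathbb{E}_0 \tau_{Q(\frac12)} ,$$

hence (2.29) gives the following for any $\epsilon < \frac{1}{16}$:

$$t_{\mathrm{MIX}}(\tfrac14) \ge \Big(1 - \frac{2\epsilon}{\sqrt{1 - \frac32 \epsilon}}\Big) \cdot \mathbb{E}_0 \tau_{Q(\frac12)} \ge \tfrac56\, \mathbb{E}_0 \tau_{Q(\frac12)} . \tag{2.32}$$

Altogether, (2.29), (2.31) and (2.32) give

$$t_{\mathrm{MIX}}(4\epsilon) - t_{\mathrm{MIX}}(1 - 2\epsilon) \le (5/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot \mathbb{E}_0 \tau_{Q(\frac12)}} \le (6/\epsilon) \sqrt{t_{\mathrm{REL}} \cdot t_{\mathrm{MIX}}(\tfrac14)} ,$$

completing the proof of the theorem. ■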

2.3. Tightness of the bound on the cutoff window. The bound
$\sqrt{t_{\mathrm{MIX}} \cdot t_{\mathrm{REL}}}$ on the size of the cutoff
window, given in Corollary 2, is essentially tight in the following sense.
Suppose that $t_M(n)$ and $t_R(n) \ge 2$ are the mixing-time
$t_{\mathrm{MIX}}(\tfrac14)$ and relaxation-time of a family $(X_t^{(n)})$ of
lazy irreducible birth-and-death chains that exhibits cutoff. For any fixed
$\epsilon > 0$, we construct a family $(Y_t^{(n)})$ of such chains satisfying

$$(1 - \epsilon)\, t_M \le t^{(n)}_{\mathrm{MIX}}(\tfrac14) \le (1 + \epsilon)\, t_M , \qquad \big|t^{(n)}_{\mathrm{REL}} - t_R\big| \le \epsilon , \tag{2.33}$$

and in addition, having a cutoff window of size
$\big(t^{(n)}_{\mathrm{MIX}} \cdot t^{(n)}_{\mathrm{REL}}\big)^{1/2}$.

Our construction is as follows: we first choose $n$ reals in $[0, 1)$, which
would serve as the nontrivial eigenvalues of our chain: any such sequence
can be realized as the nontrivial eigenvalues of a birth-and-death chain with
death probabilities all zero, and an absorbing state at $n$. Our choice of
eigenvalues will be such that $t_{\mathrm{mix}} = (\tfrac12 + o(1)) t_M$,
$t_{\mathrm{rel}} = \tfrac12 t_R$ and the chain will exhibit cutoff with a window
of $\sqrt{t_M \cdot t_R}$. Finally, we perturb the chain to make it irreducible,
and consider its lazy version to obtain (2.33).

First, notice that $t_R = o(t_M)$ (a necessary condition for the cutoff, as
given by Corollary 2). Second, if a family of chains has mixing-time and
relaxation-time $t_M$ and $t_R$ respectively, then the cutoff point is without
loss of generality the expected hitting time from $0$ to some state $m$
(namely, for $m = Q(\tfrac12)$); let $h_m$ denote this expected hitting time.
Theorem 2.5 gives

\[
  h_m = \mathbb{E}_0 \tau_m \leq \mathbb{E}_0 \tau_n \leq n \cdot t_R .
\]

Setting $\varepsilon > 0$, we may assume that $t_R \geq 2(1 + \varepsilon)$ (since $t_R \geq 2$, and
a small additive error is permitted in (2.33)). Set K = ^h,„/tR, and define 
the following sequence of eigenvalues $\{\lambda_i\}$: the first $\lfloor K \rfloor$
eigenvalues will be equal to $\lambda := 1 - 2/t_R$, and the remaining
eigenvalues will all have the value $\lambda'$, such that the sum
$\sum_{i=1}^{n} 1/(1 - \lambda_i)$ equals $\tfrac12 h_m$ (our choice of $K$ and the
fact that $h_m \leq n t_R$ ensures that $\lambda' \leq \lambda$). By Theorem 2.5, the
birth-and-death chain with absorbing state in $n$ which realizes these
eigenvalues satisfies:

' 4« = (l+o(l))Eor„ = (i+o(l)yM, 
>) _ If 

'rel 2 ^ ' 

Varo Tn > VK\j^ > gjf^^M • tR , 

where in the last inequality we merely considered the contribution of the
first $\lfloor K \rfloor$ geometric random variables to the variance. Continuing
to focus on the sum of these $\lfloor K \rfloor$ i.i.d. random variables, and
recalling that $K \to \infty$ with $n$ (by the assumption $t_R = o(t_M)$), the
Central Limit Theorem implies that

\[
  \lim_{n \to \infty} \mathbb{P}_0 \bigl( \tau_n - \mathbb{E}_0 \tau_n
    > \gamma \cdot \sqrt{t_M \cdot t_R} \bigr) \geq c(\gamma, \varepsilon) > 0
  \quad \text{for any } \gamma > 0 .
\]

Hence, the cutoff window of this chain has order at least $\sqrt{t_M \cdot t_R}$.

Clearly, perturbing the transition kernel to have all death-probabilities
equal some $\varepsilon'$ (giving an irreducible chain), shifts every eigenvalue
by at most $\varepsilon'$ (note that $\tau_n$ from $0$ has the same distribution if
$n$ is an absorbing state). Finally, the lazy version of this chain has twice
the values of $\mathbb{E}_0 \tau_n$ and $t_{\mathrm{rel}}$, giving the required
result (2.33).
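
The role of the eigenvalues in this construction can be illustrated numerically. The sketch below is not the construction of the text itself: it assumes one particular realization, a pure-birth chain that holds at state $i$ with probability $\lambda_i$ and otherwise moves to $i+1$, for which the hitting time of the absorbing state is plainly a sum of independent geometric variables with success probabilities $1 - \lambda_i$. The eigenvalue list `eigs` and the helper names are hypothetical and chosen only for illustration.

```python
def geometric_sum_moments(eigs):
    """Mean and variance of a sum of independent geometric variables,
    one per eigenvalue lam, each with success probability 1 - lam."""
    mean = sum(1.0 / (1.0 - lam) for lam in eigs)
    var = sum(lam / (1.0 - lam) ** 2 for lam in eigs)
    return mean, var


def absorption_time_pmf(eigs, horizon):
    """P(tau_n = t) for t = 0..horizon in a pure-birth chain on {0, ..., n}
    that stays at state i with probability eigs[i] and otherwise moves to
    i + 1; state n = len(eigs) is absorbing."""
    n = len(eigs)
    alive = [1.0] + [0.0] * (n - 1)  # mass on the non-absorbed states 0..n-1
    pmf = [0.0]                      # the chain starts at 0, so tau_n >= 1
    for _ in range(horizon):
        nxt = [0.0] * n
        absorbed = 0.0
        for i, mass in enumerate(alive):
            nxt[i] += mass * eigs[i]
            if i + 1 < n:
                nxt[i + 1] += mass * (1.0 - eigs[i])
            else:
                absorbed += mass * (1.0 - eigs[i])
        pmf.append(absorbed)
        alive = nxt
    return pmf


eigs = [0.5, 0.5, 0.5, 0.2, 0.2]  # hypothetical eigenvalues in [0, 1)
mean, var = geometric_sum_moments(eigs)

# Cross-check the closed forms against the exact law of the hitting time.
pmf = absorption_time_pmf(eigs, horizon=400)
mean_dp = sum(t * p for t, p in enumerate(pmf))
var_dp = sum(t * t * p for t, p in enumerate(pmf)) - mean_dp ** 2
```

Under these assumptions the mean is $\sum_i 1/(1-\lambda_i)$ and the variance is $\sum_i \lambda_i/(1-\lambda_i)^2$, which is why repeating a single eigenvalue close to 1 many times inflates the variance while leaving room to fix the mean with the remaining eigenvalues.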

3. Continuous-time chains and $\delta$-lazy discrete-time chains

In this section, we discuss the versions of Corollary 2 (and Theorem
2.2) for the cases of either continuous-time chains (Theorem 3), or
$\delta$-lazy discrete-time chains (Theorem 3.1). Since the proofs of these
versions follow the original arguments almost entirely, we describe only
the modifications required in the new settings.

3.1. Continuous-time birth-and-death chains. In order to prove Theorem 3,
recall the definition of the heat-kernel of a continuous-time chain
as $H_t(x,y) := \mathbb{P}_x(X_t = y)$, rewritten in matrix-representation as
$H_t = e^{t(P - I)}$ (where $P$ is the transition kernel of the chain).

It is well known (and easy) that if $H_t$, $\tilde{H}_t$ are the heat-kernels
corresponding to the continuous-time chain and the lazy continuous-time
chain, then $H_t = \tilde{H}_{2t}$ for any $t$. This follows immediately from the
next simple and well-known matrix-exponentiation argument:

\[
  H_t = e^{t(P - I)} = e^{2t \left( \frac{P + I}{2} - I \right)} = \tilde{H}_{2t} .
  \tag{3.1}
\]

Hence, it suffices to show cutoff for the lazy continuous-time chains. We
therefore need to simply adjust the original proof dealing with lazy
irreducible chains, from the discrete-time case to the continuous-time case.
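
The identity between the heat-kernel of a chain at time $t$ and the heat-kernel of its lazy version at time $2t$ is easy to check numerically. The following is a minimal sketch under stated assumptions: the kernel `P` is a hypothetical birth-and-death chain on four states, and the matrix exponential is approximated by a truncated Taylor series, which is adequate only for small matrices and moderate $t$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(p, t, terms=80):
    """Approximate exp(t (P - I)) by a truncated Taylor series."""
    n = len(p)
    gen = [[t * (p[i][j] - (1.0 if i == j else 0.0)) for j in range(n)]
           for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (t (P - I))^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, gen)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def lazy(p):
    """Lazy version (P + I) / 2 of a transition kernel."""
    n = len(p)
    return [[0.5 * p[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Hypothetical birth-and-death kernel on {0, 1, 2, 3}.
P = [[0.3, 0.7, 0.0, 0.0],
     [0.2, 0.3, 0.5, 0.0],
     [0.0, 0.4, 0.2, 0.4],
     [0.0, 0.0, 0.6, 0.4]]

t = 1.5
H = heat_kernel(P, t)                 # chain run for time t
H_lazy = heat_kernel(lazy(P), 2 * t)  # lazy chain run for time 2t
max_gap = max(abs(H[i][j] - H_lazy[i][j])
              for i in range(4) for j in range(4))
```

Here `max_gap` should vanish up to floating-point rounding, since the two generators $t(P - I)$ and $2t(\frac{P+I}{2} - I)$ are the same matrix.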

The first modification is in the proof of Lemma 2.3, where a no-crossing
coupling was constructed for the discrete-time chain. Clearly, no such
coupling is required for the continuous case, as the event that the two
chains cross one another at precisely the same time now has probability 0.

To complete the proof, one must show that the statement of Corollary
2.8 still holds; indeed, this follows from the fact that the hitting time
$\tau_{Q(1-\varepsilon)}$ of the discrete-time chain is concentrated, combined with
the concentration of the sum of the exponential variables that determine
the timescale of the continuous-time chain.

3.2. Discrete-time $\delta$-lazy birth-and-death chains.

Theorem 3.1. Let $(X_t^{(n)})$ be a family of discrete-time $\delta$-lazy
birth-and-death chains, for some fixed $\delta > 0$. Then $(X_t^{(n)})$ exhibits
cutoff in total-variation iff $t_{\mathrm{rel}}^{(n)} = o(t_{\mathrm{mix}}^{(n)})$; the
cutoff window size is at most
$\sqrt{t_{\mathrm{mix}}^{(n)}(\tfrac14) \cdot t_{\mathrm{rel}}^{(n)}}$.

Proof. In order to extend Theorem 2.2 and Corollary 2 to $\delta$-lazy chains,
notice that there are precisely two locations where their proofs rely on the
fact that the chain is lazy. The first location is the construction of the
no-crossing coupling in the proof of Lemma 2.3. The second location is the
fact that all eigenvalues are non-negative in the application of Theorem 2.5.

Though we can no longer construct a no-crossing coupling, Lemma 2.3
can be mended as follows: recalling that $P(x,x) \geq \delta$ for all
$x \in \Omega_n$, define $P' = \frac{1}{1-\delta}(P - \delta I)$, and notice that $P'$
and $P$ share the same stationary distribution (and hence define the same
quantile states $Q(\varepsilon)$ and $Q(1 - \varepsilon)$ on $\Omega_n$). Let $X'$ denote a
chain which has the transition kernel $P'$, and $X$ denote its coupled
appropriate lazy version: the number of steps it takes $X$ to perform
the corresponding move of $X'$ is an independent geometric random variable
with mean $1/(1 - \delta)$.

Set $p = 1 - \delta(1 - 2\varepsilon)$, and condition on the path of the chain $X'$,
from the starting point and until this chain completes
$T = \lceil \log_p \varepsilon \rceil$ rounds from $Q(\varepsilon)$ to $Q(1 - \varepsilon)$, back
and forth. As argued before, as $X$ follows this path, upon completion of
each commute time from $Q(\varepsilon)$ to $Q(1 - \varepsilon)$ and back, it has
probability at least $1 - 2\varepsilon$ to cross $\tilde{X}$. Hence, by definition, in
each such trip there is a probability of at least $\delta(1 - 2\varepsilon)$ that $X$
and $\tilde{X}$ coalesce. Crucially, these events are independent, since we
pre-conditioned on the trajectory of $X'$. Thus, after $T$ such trips, $X$
and $\tilde{X}$ have a probability of at least $1 - \varepsilon$ to meet, as required.

It remains to argue that the expressions for the expectation and variance
of the hitting-times, which were derived from Theorem 2.5, remain
unchanged when moving from the $\tfrac12$-lazy setting to $\delta$-lazy chains.
Indeed, this follows directly from the expression for the
probability-generating-function, as given in (2.11). ■
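
Two elementary facts used in the coupling argument for $\delta$-lazy chains can be checked on a small example: removing the laziness via $P' = (P - \delta I)/(1 - \delta)$ does not change the stationary distribution, and $T = \lceil \log_p \varepsilon \rceil$ rounds with $p = 1 - \delta(1 - 2\varepsilon)$ suffice to push the failure probability $p^T$ below $\varepsilon$. This is a sketch only; the kernel `P` and the values of `delta` and `eps` are hypothetical.

```python
import math


def stationary(p):
    """Stationary distribution of a birth-and-death kernel, obtained
    from the detailed balance equations pi(i) p_i = pi(i+1) q_{i+1}."""
    weights = [1.0]
    for i in range(len(p) - 1):
        weights.append(weights[-1] * p[i][i + 1] / p[i + 1][i])
    total = sum(weights)
    return [w / total for w in weights]


def remove_laziness(p, delta):
    """P' = (P - delta I) / (1 - delta); needs P(x, x) >= delta for all x."""
    n = len(p)
    return [[(p[i][j] - (delta if i == j else 0.0)) / (1.0 - delta)
             for j in range(n)] for i in range(n)]


def rounds_needed(delta, eps):
    """T = ceil(log_p eps) for p = 1 - delta (1 - 2 eps)."""
    p = 1.0 - delta * (1.0 - 2.0 * eps)
    return math.ceil(math.log(eps) / math.log(p))


# Hypothetical delta-lazy kernel: every holding probability is >= delta.
delta, eps = 0.2, 0.05
P = [[0.3, 0.7, 0.0, 0.0],
     [0.2, 0.3, 0.5, 0.0],
     [0.0, 0.4, 0.2, 0.4],
     [0.0, 0.0, 0.6, 0.4]]

P_prime = remove_laziness(P, delta)
pi, pi_prime = stationary(P), stationary(P_prime)

p = 1.0 - delta * (1.0 - 2.0 * eps)
T = rounds_needed(delta, eps)
```

The stationary distributions agree because subtracting $\delta I$ and rescaling multiplies every birth and death probability by the same factor $1/(1-\delta)$, leaving the detailed balance ratios untouched.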

4. Separation in birth-and-death chains

In this section, we provide the proofs for Proposition 4 and Corollary 5.

Let $(X_t)$ be an ergodic birth-and-death chain on $\Omega_n = \{0, \ldots, n\}$,
with a transition kernel $P$ and stationary distribution $\pi$. Let
$d_{\mathrm{sep}}(t; x)$ denote the separation of $X_t$, started from $x$, from
$\pi$, that is

\[
  d_{\mathrm{sep}}(t; x) := \max_{y \in \Omega_n} \bigl( 1 - P^t(x,y)/\pi(y) \bigr) .
\]

According to this notation, $d_{\mathrm{sep}}(t) := \max_{x \in \Omega_n} d_{\mathrm{sep}}(t; x)$
measures separation from the worst starting position.
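
The definition of separation translates directly into a short computation. The sketch below assumes a concrete example that is not taken from the text, a lazy simple random walk on $\{0, \ldots, 4\}$ with holding at the endpoints (whose stationary distribution is uniform), and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, t):
    """P^t for a non-negative integer t (P^0 is the identity)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, p)
    return result


def separation(p, pi, t, x):
    """d_sep(t; x) = max over y of 1 - P^t(x, y) / pi(y)."""
    row = mat_pow(p, t)[x]
    return max(1.0 - row[y] / pi[y] for y in range(len(pi)))


def worst_separation(p, pi, t):
    """d_sep(t) = max over starting points x of d_sep(t; x)."""
    return max(separation(p, pi, t, x) for x in range(len(pi)))


# Lazy simple random walk on {0, ..., 4}: move left or right with
# probability 1/4 each, and hold otherwise (also when a move is blocked).
n = 4
P = [[0.0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
    for j in (i - 1, i + 1):
        if 0 <= j <= n:
            P[i][j] = 0.25
    P[i][i] = 1.0 - sum(P[i])
pi = [1.0 / (n + 1)] * (n + 1)  # uniform, since P is symmetric

values = [worst_separation(P, pi, t) for t in range(0, 41, 5)]
```

At $t = 0$ the separation equals 1 from every starting point, and the values in `values` decrease towards 0 as $t$ grows.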

The chain $X$ is called monotone iff $P_{i,i+1} + P_{i+1,i} \leq 1$ for all $i < n$.
It is well known (and easy to show) that if $X$ is monotone, then the
likelihood ratio $P^t(0,k)/\pi(k)$ is monotone decreasing in $k$ (see, e.g.,
[8]). An immediate corollary of this fact is that the separation of such a
chain from the stationary distribution is the same for the two starting
points $\{0, n\}$. We provide the proof of this simple fact for completeness.

Lemma 4.1. Let $P$ be the transition kernel of a monotone birth-and-death
chain on $\Omega_n = \{0, \ldots, n\}$. If $f : \Omega_n \to \mathbb{R}$ is a monotone
increasing (decreasing) function, so is $Pf$. In particular,

\[
  P^t(k, 0) \geq P^t(k+1, 0) \quad \text{for any } t \geq 0 \text{ and } 0 \leq k < n .
  \tag{4.1}
\]

Proof. Let $\{p_i\}$, $\{q_i\}$ and $\{r_i\}$ denote the birth, death and holding
probabilities of the chain respectively, and for convenience, let $f(x)$ be
$0$ for any $x \notin \Omega_n$. Assume without loss of generality that $f$ is
increasing (otherwise, one may consider $-f$). In this case, the following
holds for every $0 \leq x < n$:

\[
  Pf(x) = q_x f(x-1) + r_x f(x) + p_x f(x+1)
        \leq (1 - p_x) f(x) + p_x f(x+1) ,
\]

and

\[
  Pf(x+1) = q_{x+1} f(x) + r_{x+1} f(x+1) + p_{x+1} f(x+2)
          \geq q_{x+1} f(x) + (1 - q_{x+1}) f(x+1) .
\]

Therefore, by the monotonicity of $f$ and the fact that $p_x + q_{x+1} \leq 1$ we
obtain that $Pf(x) \leq Pf(x+1)$, as required.

Finally, the monotonicity of the chain implies that $P^t(\cdot, 0)$ is monotone
decreasing for $t = 1$, hence the above argument immediately implies that
this is the case for any integer $t \geq 1$. ■

By reversibility, the following holds for any monotone birth-and-death
chain with transition kernel $P$ and stationary distribution $\pi$:

\[
  \frac{P^t(0,k)}{\pi(k)} \geq \frac{P^t(0,k+1)}{\pi(k+1)}
  \quad \text{for any } t \geq 0 \text{ and } 0 \leq k < n .
  \tag{4.2}
\]

Figure 1. A monotone irreducible birth-and-death chain
where worst separation may not involve the endpoints. Edge 
weights denote the conductances (see Example 4.3). 

In particular, the maximum of $1 - P^t(0,j)/\pi(j)$ is attained at $j = n$, and
the separation is precisely the same when starting at either of the two
endpoints:

Corollary 4.2. Let $(X_t)$ be a monotone irreducible birth-and-death chain
on $\Omega_n = \{0, \ldots, n\}$ with transition kernel $P$ and stationary
distribution $\pi$. Then for any integer $t$,

\[
  \mathrm{sep}(P^t(0, \cdot), \pi) = 1 - \frac{P^t(0,n)}{\pi(n)}
    = 1 - \frac{P^t(n,0)}{\pi(0)} = \mathrm{sep}(P^t(n, \cdot), \pi) .
\]

Since lazy chains are a special case of monotone chains, the relation (3.1)
between lazy and non-lazy continuous-time chains gives an analogous
statement for continuous-time irreducible birth-and-death chains. That is,
for any real $t > 0$,

\[
  \mathrm{sep}(H_t(0, \cdot), \pi) = 1 - \frac{H_t(0,n)}{\pi(n)}
    = 1 - \frac{H_t(n,0)}{\pi(0)} = \mathrm{sep}(H_t(n, \cdot), \pi) ,
\]

where $H_t$ is the heat-kernel of the chain, and $\pi$ is its stationary
distribution.
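
The discrete-time statement for monotone chains can be checked numerically on a small example. The sketch below uses a hypothetical monotone (but not lazy) birth-and-death kernel `P` on four states; it verifies that the likelihood ratio from 0 is decreasing and that the separation from the two endpoints coincides with the endpoint-to-endpoint value.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stationary(p):
    """Stationary distribution of a birth-and-death kernel via detailed balance."""
    weights = [1.0]
    for i in range(len(p) - 1):
        weights.append(weights[-1] * p[i][i + 1] / p[i + 1][i])
    total = sum(weights)
    return [w / total for w in weights]


def is_monotone(p, tol=1e-12):
    """Monotonicity condition P(i, i+1) + P(i+1, i) <= 1 for all i < n."""
    return all(p[i][i + 1] + p[i + 1][i] <= 1.0 + tol
               for i in range(len(p) - 1))


# Hypothetical monotone birth-and-death kernel on {0, 1, 2, 3}.
P = [[0.3, 0.7, 0.0, 0.0],
     [0.2, 0.3, 0.5, 0.0],
     [0.0, 0.4, 0.2, 0.4],
     [0.0, 0.0, 0.6, 0.4]]
pi = stationary(P)
n = len(P) - 1

power = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
checks = []
for t in range(21):
    # Likelihood ratio P^t(0, k) / pi(k), expected to be decreasing in k.
    ratios = [power[0][k] / pi[k] for k in range(n + 1)]
    decreasing = all(ratios[k] >= ratios[k + 1] - 1e-12 for k in range(n))
    sep_from_0 = max(1.0 - power[0][y] / pi[y] for y in range(n + 1))
    sep_from_n = max(1.0 - power[n][y] / pi[y] for y in range(n + 1))
    endpoint_value = 1.0 - power[0][n] / pi[n]
    checks.append((decreasing, sep_from_0, sep_from_n, endpoint_value))
    power = mat_mul(power, P)  # advance from P^t to P^(t+1)
```

Each entry of `checks` records, for one value of $t$, whether the ratio is decreasing together with the three separation values, which should agree up to rounding.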

Unfortunately, when attempting to generalize Lemma 4.1 (and Corollary
4.2) to an arbitrary starting point, one finds that it is no longer the case
that the worst separation involves one of the endpoints, even if the chain is
monotone and irreducible. This is demonstrated next.

Example 4.3. Let $P$ and $\pi$ denote the transition kernel and stationary
distribution of the birth-and-death chain on the state space
$\Omega_3 = \{0, 1, 2, 3\}$, given in Figure 1. It is easy to verify that this
chain is monotone and irreducible, and furthermore, that the following holds:

min ^ ^^'^^ is attained solely aty = 2, 
ysQ.3 7r(y) 

P^(x,y) 

min — is attained solely at x = y = 1 . 

j.-,ven3 7r(y) 

Thus, when starting from an interior point, the worst separation might not 
be attained by an endpoint, and in addition, the overall worst separation may 
not involve the endpoints at all. 

However, as we next show, once we replace the monotonicity requirement
with the stricter assumption that the chain is lazy, it turns out that the
above phenomenon can no longer occur.

The approach that led to the following result relied on maximal couplings
(see, e.g., [19], [22] and [18], and also [23, Chapter III.3]). We provide a
straightforward proof for it, based on an inductive argument.

Lemma 4.4. Let $P$ be the transition kernel of a lazy birth-and-death chain.
Then for any unimodal non-negative $f : \Omega_n \to \mathbb{R}^+$, the function $Pf$
is also unimodal. In particular, for any integer $t$, all columns of $P^t$ are
unimodal.

Proof. Let $\{p_i\}$, $\{q_i\}$ and $\{r_i\}$ be the birth, death and holding
probabilities of the chain respectively, and for convenience, define $f(z)$
to be $0$ for $z \in \mathbb{N} \setminus \Omega_n$. Let $m \in \Omega_n$ be a state
achieving the global maximum of $f$, and set $g = Pf$. For every
$0 < x < m$, the unimodality of $f$ implies that

\[
  g(x) = q_x f(x-1) + r_x f(x) + p_x f(x+1)
       \geq q_x f(x-1) + (1 - q_x) f(x) ,
\]

and similarly,

\[
  g(x-1) = q_{x-1} f(x-2) + r_{x-1} f(x-1) + p_{x-1} f(x)
         \leq (1 - p_{x-1}) f(x-1) + p_{x-1} f(x) .
\]

Therefore, by the monotonicity of the chain, we deduce that
$g(x) \geq g(x-1)$. The same argument shows that for every $m < y < n$ we
have $g(y) \geq g(y+1)$.

As $g$ is increasing on $\{0, \ldots, m-1\}$ and decreasing on
$\{m+1, \ldots, n\}$, unimodality will follow from showing that
$g(m) \geq \min\{g(m-1), g(m+1)\}$ (the global maximum of $g$ would then be
attained at $m' \in \{m-1, m, m+1\}$). To this end, assume without loss of
generality that $f(m-1) \geq f(m+1)$. The following holds:

\[
  g(m) = q_m f(m-1) + r_m f(m) + p_m f(m+1)
       \geq r_m f(m) + (1 - r_m) f(m+1) ,
\]

and

\[
  g(m+1) = q_{m+1} f(m) + r_{m+1} f(m+1) + p_{m+1} f(m+2)
         \leq q_{m+1} f(m) + (1 - q_{m+1}) f(m+1) .
\]

Thus, the laziness of the chain implies that $g(m) \geq g(m+1)$, as required. ■

By reversibility, Lemma 4.4 has the following corollary:

Corollary 4.5. Let $(X_t)$ be a lazy and irreducible birth-and-death chain on
the state space $\Omega_n = \{0, \ldots, n\}$, with transition kernel $P$ and
stationary distribution $\pi$. Then for any $s \in \Omega_n$ and any integer
$t \geq 0$, the function $f(x) := P^t(s,x)/\pi(x)$ is unimodal.
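
The unimodality of the likelihood ratio for lazy chains can be observed on a small example. The following sketch assumes a hypothetical lazy kernel, obtained as $(P + I)/2$ from an arbitrary birth-and-death kernel `P`, and a helper `is_unimodal` that tests whether a sequence weakly increases and then weakly decreases, up to a small numerical tolerance.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stationary(p):
    """Stationary distribution of a birth-and-death kernel via detailed balance."""
    weights = [1.0]
    for i in range(len(p) - 1):
        weights.append(weights[-1] * p[i][i + 1] / p[i + 1][i])
    total = sum(weights)
    return [w / total for w in weights]


def is_unimodal(values, tol=1e-12):
    """True if the sequence weakly increases and then weakly decreases."""
    descending = False
    for a, b in zip(values, values[1:]):
        if b < a - tol:
            descending = True
        elif b > a + tol and descending:
            return False  # an increase after a decrease: a second peak
    return True


# Hypothetical birth-and-death kernel and its lazy version (P + I) / 2.
P = [[0.3, 0.7, 0.0, 0.0],
     [0.2, 0.3, 0.5, 0.0],
     [0.0, 0.4, 0.2, 0.4],
     [0.0, 0.0, 0.6, 0.4]]
size = len(P)
lazy_P = [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(size)]
          for i in range(size)]
pi = stationary(lazy_P)

power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
all_unimodal = True
for t in range(31):
    for s in range(size):
        ratio = [power[s][x] / pi[x] for x in range(size)]
        all_unimodal = all_unimodal and is_unimodal(ratio)
    power = mat_mul(power, lazy_P)  # advance from P^t to P^(t+1)
```

The flag `all_unimodal` stays true over all starting points and all times tested, in line with the corollary.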

Remark. The maximum of the unimodal function $f(x)$ in Corollary 4.5 need
not be located at $x = s$, the starting point of the chain. This can be
demonstrated, e.g., by the biased random walk.

Proposition 4 will immediately follow from the above results. 

Proof of Proposition 4. We begin with the case where $(X_t)$ is a lazy
birth-and-death chain, with transition kernel $P$. Let $s \in \Omega_n$ be a
starting position which maximizes $d_{\mathrm{sep}}(t; s)$. Then by Corollary
4.5, $d_{\mathrm{sep}}(t)$ is either equal to $1 - P^t(s,0)/\pi(0)$ or to
$1 - P^t(s,n)/\pi(n)$. Consider the first case (the second case is treated
by the exact same argument); by reversibility,

\[
  d_{\mathrm{sep}}(t) = 1 - \frac{P^t(0,s)}{\pi(s)} \leq 1 - \frac{P^t(0,n)}{\pi(n)} ,
\]

where the last inequality is by Lemma 4.1. Therefore, the endpoints of $X$
assume the worst separation distance at every time $t$.

To show that $d_{\mathrm{sep}}(t) = 1 - H_t(0,n)/\pi(n)$ in the continuous-time
case, recall that

\[
  H_t(x,y) = \mathbb{P}_x(X_t = y) = \mathbb{E}\bigl[ P^{N_t}(x,y) \bigr]
           = \sum_k P^k(x,y) \, \mathbb{P}(N_t = k) ,
\]

where $P$ is the transition kernel of the corresponding discrete-time chain,
and $N_t$ is a Poisson random variable with mean $t$. Though $P^k$ has
unimodal columns for any integer $k$, a linear combination of the matrices
$P^k$ does not necessarily maintain this property. We therefore consider a
variant of the process, where $N_t$ is approximated by an appropriate
binomial variable.

Fix $t > 0$, and for any integer $m \geq 2t$ let $N'_t(m)$ be a binomial random
variable with parameters $\mathrm{Bin}(m, t/m)$. Since $N'_t(m)$ converges in
distribution to $N_t$, it follows that $H'_t(m) := \mathbb{E}[P^{N'_t(m)}]$
converges to $H_t$ as $m \to \infty$. Writing $N'_t(m)$ as a sum of independent
indicators $\{B_i : i = 1, \ldots, m\}$ with success probabilities $t/m$, and
letting $Q := (1 - \tfrac{t}{m}) I + \tfrac{t}{m} P$, we have

\[
  H'_t(m) = \mathbb{E}\bigl[ P^{\sum_{i=1}^{m} B_i} \bigr] = Q^m .
\]

Note that for every $m \geq 2t$, the transition kernel $Q$ corresponds to a
lazy birth-and-death chain, thus Lemma 4.4 ensures that $H'_t(m)$ has
unimodal columns for every such $m$. In particular,
$H_t = \lim_{m \to \infty} H'_t(m)$ has unimodal columns. This completes the
proof. ■
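
The binomial approximation used in this proof can be explored numerically. The sketch below assumes a hypothetical birth-and-death kernel `P`, computes $Q^m$ for $Q = (1 - t/m) I + (t/m) P$ by repeated squaring, and compares it with a truncated Taylor approximation of $H_t = e^{t(P - I)}$; the function names are illustrative only.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_pow(a, k):
    """a^k for an integer k >= 0, by repeated squaring."""
    result, base = identity(len(a)), a
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


def heat_kernel(p, t, terms=80):
    """Approximate exp(t (P - I)) by a truncated Taylor series."""
    n = len(p)
    gen = [[t * (p[i][j] - (1.0 if i == j else 0.0)) for j in range(n)]
           for i in range(n)]
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, gen)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def binomial_kernel(p, t, m):
    """Q^m for Q = (1 - t/m) I + (t/m) P; Q is lazy whenever m >= 2t."""
    n = len(p)
    q = [[(t / m) * p[i][j] + ((1.0 - t / m) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return mat_pow(q, m)


def columns_unimodal(mat, tol=1e-12):
    """True if every column weakly increases and then weakly decreases."""
    n = len(mat)
    for y in range(n):
        descending = False
        for x in range(n - 1):
            a, b = mat[x][y], mat[x + 1][y]
            if b < a - tol:
                descending = True
            elif b > a + tol and descending:
                return False
    return True


# Hypothetical birth-and-death kernel on {0, 1, 2, 3}.
P = [[0.3, 0.7, 0.0, 0.0],
     [0.2, 0.3, 0.5, 0.0],
     [0.0, 0.4, 0.2, 0.4],
     [0.0, 0.0, 0.6, 0.4]]
t = 1.5
H = heat_kernel(P, t)
approx = {m: binomial_kernel(P, t, m) for m in (4, 64, 4096)}  # all m >= 2t
errors = {m: max(abs(a[i][j] - H[i][j]) for i in range(4) for j in range(4))
          for m, a in approx.items()}
```

In this example the entries of `errors` shrink roughly like $1/m$, and both the approximations $Q^m$ and the limit `H` have unimodal columns.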

Proof of Corollary 5. By Theorem 3, total-variation cutoff (from the worst
starting position) occurs iff $t_{\mathrm{rel}} = o(t_{\mathrm{mix}}(\tfrac14))$.
Combining Proposition 4 with [10, Theorem 5.1] we deduce that separation
cutoff (from the worst starting point) occurs if and only if
$t_{\mathrm{rel}} = o(t_{\mathrm{sep}}(\tfrac14))$ (where
$t_{\mathrm{sep}}(\varepsilon) = \max_x t_{\mathrm{sep}}(\varepsilon; x)$ is the minimum $t$
such that $\max_x \mathrm{sep}(H_t(x, \cdot), \pi) \leq \varepsilon$).

Therefore, the proof will follow from the well known fact that
$t_{\mathrm{sep}}(\tfrac14)$ and $t_{\mathrm{mix}}(\tfrac14)$ have the same order. One
can obtain this fact, for instance, from Lemma 7 of [4, Chapter 4], which
states that (as the chain is reversible)

\[
  d(t) \leq d_{\mathrm{sep}}(t) , \quad \text{and} \quad
  d_{\mathrm{sep}}(2t) \leq 1 - \bigl( 1 - \bar{d}(t) \bigr)^2 ,
\]

where $\bar{d}(t) := \max_{x,y \in \Omega_n}
\| \mathbb{P}_x(X_t \in \cdot) - \mathbb{P}_y(X_t \in \cdot) \|_{\mathrm{TV}}$. Combining
this with the sub-multiplicativity of $\bar{d}(t)$, and the fact that
$d(t) \leq \bar{d}(t) \leq 2 d(t)$ (see Definition 3.1 in [4, Chapter 4]), we
obtain that for any $t$,

\[
  d(t) \leq d_{\mathrm{sep}}(t) , \quad \text{and} \quad
  d_{\mathrm{sep}}(8t) \leq 2 \bar{d}(4t) \leq 32 \bigl( d(t) \bigr)^4 .
\]

This in turn implies that
$\tfrac18 t_{\mathrm{sep}}(\tfrac14) \leq t_{\mathrm{mix}}(\tfrac14) \leq t_{\mathrm{sep}}(\tfrac14)$,
as required. ■
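
For the reader's convenience, the chain of inequalities behind the last display can be spelled out. This is a sketch of the standard argument, writing $d(t)$ for the worst-case total-variation distance from stationarity, $\bar{d}(t)$ for the maximal total-variation distance between the chain started at two different points, and using only the facts quoted above (the reversibility bound, sub-multiplicativity of $\bar{d}$, and $\bar{d}(t) \leq 2d(t)$):

```latex
\begin{align*}
d_{\mathrm{sep}}(8t)
  &\le 1 - \bigl(1 - \bar{d}(4t)\bigr)^2
   = 2\bar{d}(4t) - \bar{d}(4t)^2
   \le 2\,\bar{d}(4t) \\
  &\le 2\,\bigl(\bar{d}(t)\bigr)^4
   \le 2\,\bigl(2\,d(t)\bigr)^4
   = 32\,\bigl(d(t)\bigr)^4 .
\end{align*}
```

In particular, if $d(t) \leq \tfrac14$ then $d_{\mathrm{sep}}(8t) \leq 32 / 4^4 = \tfrac18$, so the separation time exceeds the total-variation mixing time by at most a factor of 8.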

5. Concluding remarks and open problems 

• As stated in Corollary 5, our results on continuous-time birth-and- 
death chains, combined with those of [10], imply that cutoff in total- 
variation distance is equivalent to separation cutoff for such chains. 
This raises the following question: 

Question 5.1. Let $(X_t^{(n)})$ denote a family of irreducible reversible
Markov chains, either in continuous-time or in lazy discrete-time. 
Is it true that there is cutoff in separation iff there is cutoff in total- 
variation distance (where the distance in both cases is measured 
from the worst starting position)? 

• One might assume that the cutoff-criterion (1.4) also holds for close
variants of birth-and-death chains. For that matter, we note that
Aldous's example of a family of reversible Markov chains, which
satisfies $t_{\mathrm{rel}} = o(t_{\mathrm{mix}}(\tfrac14))$ and yet does not exhibit
cutoff, can be written so that each of its chains is a biased random
walk on a cycle. In other words, it suffices that a family of
birth-and-death chains permits the one extra transition between
states $0$ and $n$, and already the cutoff criterion (1.4) ceases to hold.

• Finally, it would be interesting to characterize the cutoff criterion in 
additional natural families of ergodic Markov chains. 

Question 5.2. Does (1.4) hold for the family of lazy simple random 
walks on vertex transitive bounded-degree graphs? 